SegFormer: A Topic Segmentation Model with Controllable Range of Attention

Abstract

Topic segmentation aims to reveal the latent structure of a document and divide it into multiple parts. However, current neural solutions are limited in the context modeling of sentences and the feature representation of candidate boundaries. This causes the model to suffer from inefficient sentence context encoding and noise information interference. In this paper, we design a new text segmentation model, SegFormer, with unidirectional attention blocks to better model sentence representations. To alleviate the problem of noise information interference, SegFormer uses a novel additional context aggregator and a topic classification loss to guide the model to aggregate information within the appropriate range. In addition, SegFormer applies an iterative prediction algorithm to search for optimal boundaries progressively. We evaluate SegFormer's generalization ability, multilingual ability, and application ability on multiple challenging real-world datasets. Experiments show that our model significantly improves performance by 7.5% on the benchmark WIKI-SECTION compared to several strong baselines. Applied to a real-world dataset to separate normal and advertisement segments in product marketing essays, SegFormer also achieves superior performance in evaluation against other cutting-edge models.

Similar resources

Topic Segmentation with a Structured Topic Model

We present a new hierarchical Bayesian model for unsupervised topic segmentation. This new model integrates a point-wise boundary sampling algorithm used in Bayesian segmentation into a structured topic model that can capture a simple hierarchical topic structure latent in documents. We develop an MCMC inference algorithm to split/merge segment(s). Experimental results show that our model outpe...

Topic Segmentation with an Ordering-Based Topic Model

Documents from the same domain usually discuss similar topics in a similar order. However, the number of topics and the exact topics discussed in each individual document can vary. In this paper we present a simple topic model that uses generalised Mallows models and incomplete topic orderings to incorporate this ordering regularity into the probabilistic generative process of the new model. We...

A Hierarchical Bayesian Model for Topic Segmentation

Many streams of real-world data, such as conversations or body movements, consist of relatively coherent segments, each characterized by particular topics or controllers. Making sense of these data requires simultaneously segmenting the sequences and inferring the structure of the segments. We present a hierarchical Bayesian model that can be used to break a sequence of utterances or movements ...

A Dynamic Topic Model for Document Segmentation

Factor language models, like Latent Semantic Analysis, represent documents as mixtures of topics, and have a variety of applications. Normally, the mixture is computed at the whole-document level, that is, the entire document contains material on several topics, without specifying where they occur in the document. In this paper, we describe a new model which computes the topic mixture estimate ...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i11.26477